The output generated by photovoltaic arrays is influenced mainly by the
irradiance, which is non-uniformly distributed over a day. This results in
current-limiting behavior and nonlinear output characteristics, so
conventional protection devices cannot detect and clear faults appropriately.
This paper proposes a low-cost model for a multi-scale dual-stage
photovoltaic fault detection, classification, and monitoring technique
developed through MATLAB/Simulink. The main contribution of this paper
is that the technique can detect multiple common faults, can be applied to multi-scale
photovoltaic arrays regardless of environmental conditions, and is beneficial
for photovoltaic system maintenance work. The experimental results show
that the developed algorithm, using supervised learning algorithms combined
with k-fold cross-validation, performed well in identifying
six common faults of photovoltaic arrays, achieved 100% accuracy in fault
detection, and achieved good accuracy in fault classification. Challenges and
suggestions for future research directions are also discussed in this paper.
Overall, this study provides researchers and policymakers with a
valuable reference for developing photovoltaic system fault detection and
monitoring techniques for better feasibility, safety, and energy sustainability.

1. INTRODUCTION

Globally, power generation from solar photovoltaic (PV) systems is experiencing a significant 
increase [1]. This increase has also led to risks associated with damage to PV system components, injury to 
operators, and fire hazards to PV systems and buildings. Since PV output is nonlinear, conventional 
protection devices (CPD) such as fuses and circuit breakers can detect faults and isolate faulty circuits only at 
large fault currents and voltages. Therefore, better fault detection and monitoring techniques for PV systems 
are needed [2] for better feasibility, safety, and energy sustainability. Recent studies have developed 
advanced or intelligent fault detection and monitoring techniques for solar PV systems. The main ones are 
model-based and IV power loss curve approaches, machine learning techniques, statistics-based techniques, 
and output signal analysis techniques [3]. 

The model-based approach for detecting and identifying PV faults compares the expected data 
obtained from the simulation process with data measured from an experiment or data collected from a PV 
system [4]-[7]. This technique involves the least integration complexity with PV systems and requires low 
implementation costs. However, most studies have found that the accuracy obtained is lower than that of other
advanced PV fault detection methods. Machine learning (ML) techniques, on the other hand, have been the
most favored method for detecting and diagnosing PV system faults. This approach exploits artificial
intelligence with three main classes of algorithm (supervised learning, semi-supervised learning, and
unsupervised learning) in task completion [8]-[15]. Studies have shown that this technique achieves high
accuracy. Still, the need for data acquisition systems and advanced computing skills has made it complex and
challenging to integrate with PV systems and expensive to implement.

Meanwhile, statistics-based analysis mostly sets a threshold value and compares it with the actual
value measured to determine whether a PV system is in a normal or faulty state [13]-[16]. Earlier studies indicated that
approaches using mean differences or variances are better at identifying errors in the PV system.
However, setting the threshold limit incorrectly can reduce the accuracy of the method. Lastly, output signal
analysis, which uses the frequency-time domain to detect abnormalities in the sampled signal and thereby identify
faults in the PV system, has also attained high accuracies [17]-[20]. Nevertheless, it requires sophisticated tools to generate
the signal, making it the most expensive method.

Furthermore, most of the methods/techniques developed in previous studies were designed
to detect only specific faults and did not provide the fault location, even though finding the location of a fault is
always challenging and time-consuming for large-scale PV systems [21], [22]. In addition,
previous fault detection methods were mostly tested or evaluated only on small-scale PV arrays/systems, and they
were not examined with respect to maintenance. Studies have found that a good
maintenance system is important for inspecting and performing corrective work because different incidents
or failures have different characteristics that require specialized, competent people and different tools and
techniques to deal with them and implement corrections [23].

Hence, in this paper, we developed the multi-scale dual-stage (MsDs) model for a PV fault detection,
classification, and monitoring technique that requires a low implementation cost, can detect multiple faults
together with their locations, can be applied to all PV array scales, and is useful for PV maintenance work. The
dual-stage algorithm comprises a fault detection algorithm at stage-1 and a fault classification and location algorithm at
stage-2. The MsDs employs the supervised learning techniques of discriminant analysis (DA), k-nearest
neighbor (KNN), support vector machine (SVM), and random forest (RF) to identify the algorithm that
produces the best accuracy.

The remainder of this paper is organized as follows: i) section 2 describes the PV array modeling
and simulation processes; ii) section 3 presents the proposed MsDs technique; iii) section 4
provides the simulation and algorithm testing results, discussion, and limitations; and iv) section 5
presents the conclusion and recommendations for future work.


2. PV ARRAY MODELING AND SIMULATION 
2.1. Model and input data for solar cells 

A one-diode model (ODM) is chosen in this study for PV array modeling because of its
advantages compared to the double-diode model. It has good accuracy for steady-state conditions and fault
analysis of PV systems. Further, ODM parameters are available for most PV modules on the
market, and the ODM is the model most commonly chosen by researchers [24]. The equivalent circuit for the ODM and
its parameters are shown in Figure 1.


Figure 1. An equivalent circuit of a one-diode model with five parameters 


By using Kirchhoff's law, the output current I (in A) of the PV cell is formulated as given by (1),
(2), and (3), where IL represents the light-generated current, ID represents the diode current, and Ish
represents the shunt resistance current.
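
For reference, the standard one-diode relations that match the symbols used in this section can be written as below. This is a sketch of the conventional textbook form rather than a reproduction of the typeset equations; Io (diode saturation current), Rs, and Rsh are the quantities listed in Table 1, and the numbering is assumed to follow the order in which the three currents are introduced.

```latex
\begin{align}
I      &= I_L - I_D - I_{sh} \tag{1} \\
I_D    &= I_o \left[ \exp\!\left( \frac{q\,(V + I R_s)}{A\,k\,T} \right) - 1 \right] \tag{2} \\
I_{sh} &= \frac{V + I R_s}{R_{sh}} \tag{3}
\end{align}
```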
Here, q is the electron charge (1.6 × 10^-19 C), A is the diode ideality factor, T is the ambient temperature (K), V is
the solar cell voltage, and k is Boltzmann's constant (1.38 × 10^-23 J/K). The polycrystalline silicon PV
module Solartech Energy ASC-6P-48-200 is taken for practical comparison. The values of the input
parameters, namely open-circuit voltage (Voc), short-circuit current (Isc), series resistance (Rs), and shunt
resistance (Rsh), are obtained from the PV manufacturer's datasheet, as listed in Table 1.


Table 1. Solartech Energy ASC-6P-48-200 PV module parameter data

Parameter                         Symbol   Value
Maximum power                     Pmpp     199.998 W
Open-circuit voltage              Voc      30.12 V
Maximum power voltage             Vmp      24.6 V
Short-circuit current             Isc      8.63 A
Maximum power current             Impp     8.13 A
Light-generated current           IL       8.6789 A
Diode saturation current          Io       2.929 x 10A
Diode ideality factor             N        1.0136
Shunt resistance                  Rsh      210.82 Ω
Series resistance                 Rs       0.223 Ω
Isc temperature coefficient       α        0.06
Voc temperature coefficient       β        -0.35999
Number of solar cells in series   n        48
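
As an illustration of how the one-diode model can be evaluated with the Table 1 parameters, the following Python sketch solves the implicit current equation for one module by bisection. It is not the MATLAB/Simulink model used in this study. Several points are assumptions: the exponent of the saturation current is not legible in Table 1 and is taken here as 10^-10 A, Rs and Rsh are treated as module-level values, the 48 series cells are lumped into a single diode, and the function name `module_current` is hypothetical.

```python
import math

Q = 1.602176634e-19  # electron charge (C)
K = 1.380649e-23     # Boltzmann constant (J/K)

# Module parameters from Table 1
I_L = 8.6789      # light-generated current (A)
N_IDEAL = 1.0136  # diode ideality factor
R_SH = 210.82     # shunt resistance (ohm)
R_S = 0.223       # series resistance (ohm)
N_CELLS = 48      # solar cells in series
# ASSUMPTION: the exponent is not legible in Table 1; 1e-10 is assumed here.
I_0 = 2.929e-10   # diode saturation current (A)


def module_current(v, temp_k=298.15):
    """Module current (A) at terminal voltage v, for 0 <= v <= roughly Voc."""
    # Thermal voltage scaled by ideality factor and number of series cells
    a = N_CELLS * N_IDEAL * K * temp_k / Q

    def residual(i):
        v_d = v + i * R_S  # voltage across the diode and shunt branch
        return I_L - I_0 * (math.exp(v_d / a) - 1.0) - v_d / R_SH - i

    # The residual decreases monotonically with i, so bisection is safe.
    lo, hi = -I_L, I_L
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for v in (0.0, 24.6, 30.12):
        i = module_current(v)
        print(f"V = {v:6.2f} V  I = {i:7.3f} A  P = {v * i:8.2f} W")
```

Under these assumptions the short-circuit and open-circuit points land close to the datasheet Isc and Voc, which is a quick sanity check on the parameter set before building larger arrays.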


2.2. Simulation procedure using MATLAB/Simulink 

The modeling, simulation, and development of the PV array fault detection and classification
algorithm were carried out using MATLAB/Simulink. Using simulation data can produce a more precise algorithm. In
addition, simulation is useful under the unavoidable restrictions of a pandemic and when it is not possible
to obtain data from the actual working conditions of the
PV system under outdoor irradiance. Six common faults or abnormal conditions of PV arrays, namely degradation array (DF),
open-circuit fault (OCF), line-line fault (LLF), ground fault (GF), partial shading (PS) condition, and faulty
module (FM), were explored in this study. A PV module consists of several solar cells with
identical parameters, as shown in Figure 2. Several PV modules/panels were then used to build the PV arrays, a
modified version of the model adopted in [8]. In this study, the small-scale PV array model was configured as five parallel
PV strings, each of six PV modules in series (5*6), as presented in Figure 3.

The simulation process assumed that the PV array is the only source of fault current and that there is no
overcurrent or overvoltage from external sources. A Simulink model of the I-V testing circuit was configured
to generate the I-V curves and simulated data (power, voltage, and current) from the PV array models, as
presented in Figure 4. To save space, this paper does not present the PV array models for the six common fault simulations
individually. Instead, the models are combined in one diagram for description, as shown in Figure 5.


Figure 2. MATLAB/Simulink of the one-diode model module with solar cell

Figure 3. MATLAB/Simulink of 5*6 small-scale PV array model

Figure 5. Description of six PV array faults simulation model


These six PV array fault models were simulated and tested under standard test conditions (STC),
with an irradiance of 1000 W/m² and a module temperature of 25 °C. The simulations were carried out as follows:

i) Simulation of the LLF model was performed by short-circuiting two points at different potentials in the PV
array string. This simulation assumes that the fault impedance is zero, and an LLF with a large voltage
difference was considered.

ii) Simulation of the GF model was achieved by extending the LLF model by connecting to the ground to 
create a fault current. 

iii) Simulation of the PS model was carried out by setting the PS gain connected to the PV modules to less than 1 to
reduce the irradiance value received by the module(s) to less than 1000 W/m².

iv) Simulation of the OCF model was performed by adding an Rs to a PV array string, and Rs was set to 
infinity. 

v) Simulation of the FM model was accomplished by reversing the bypass diode of the solar cell. 

vi) Simulation of the DF model was performed by adding Rs to the PV array and gradually increasing the 
value of Rs. 
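
Two of the fault settings above (the PS gain below 1 and the added series resistance for DF) can be illustrated on a single module with a short Python sketch. This is a simplified stand-in for the Simulink fault models, not a reproduction of them: it treats one module in isolation, so array-level effects such as bypass diodes and string mismatch are not captured, and the OCF, LLF, GF, and FM cases are not covered. The saturation current exponent (10^-10 A), the proportional scaling of the light-generated current with the PS gain, and the names `ps_gain`, `extra_rs`, and `max_power` are assumptions made for illustration.

```python
import math

Q, K = 1.602176634e-19, 1.380649e-23  # electron charge (C), Boltzmann (J/K)
I_L, N_IDEAL, N_CELLS = 8.6789, 1.0136, 48
R_S, R_SH = 0.223, 210.82
I_0 = 2.929e-10  # ASSUMPTION: exponent not legible in Table 1
A_MOD = N_CELLS * N_IDEAL * K * 298.15 / Q  # scaled thermal voltage at 25 C


def module_current(v, ps_gain=1.0, extra_rs=0.0):
    """Module current (A) with optional shading gain and added series resistance."""
    i_l = ps_gain * I_L   # PS: irradiance scaled by a gain below 1
    r_s = R_S + extra_rs  # DF: series resistance increased

    def residual(i):
        v_d = v + i * r_s
        return i_l - I_0 * (math.exp(v_d / A_MOD) - 1.0) - v_d / R_SH - i

    lo, hi = -I_L, I_L  # residual is monotonically decreasing in i
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_power(ps_gain=1.0, extra_rs=0.0, steps=300, v_max=30.12):
    """Scan the I-V curve on [0, v_max] and return the largest V*I product (W)."""
    return max(
        (v_max * k / steps) * module_current(v_max * k / steps, ps_gain, extra_rs)
        for k in range(steps + 1)
    )


if __name__ == "__main__":
    print("no fault      :", round(max_power(), 1), "W")
    print("PS, gain 0.5  :", round(max_power(ps_gain=0.5), 1), "W")
    print("DF, +1 ohm Rs :", round(max_power(extra_rs=1.0), 1), "W")
```

In this simplified setting, both settings lower the maximum power relative to the fault-free module, while the added series resistance leaves the open-circuit point essentially unchanged.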


2.3. PV array model validation 

Figures 6(a) and 6(b) show the I-V and P-V curves generated from the simulation process under
STC for the small-scale (5*6) PV array model. The developed PV array model was validated by comparing the
simulated maximum power (Pmax), Voc, and Isc with the datasheet of a commercially available PV module,
the Solartech Energy ASC-6P-48-200.

Figure 6. Simulation process of (a) I-V curve of PV array model and (b) P-V curve of PV array model


The simulation results closely match the datasheet values, as shown in
Table 2. Therefore, it can be concluded that the proposed PV array model is accurate enough to predict the
performance of the PV array under normal and fault conditions in this study.


Table 2. Comparison of simulation results (small-scale PV model) with actual PV module datasheet

             Solartech Energy ASC-6P-48-200      Simulated data
Parameter    One module    5*6 PV array model    One module    5*6 PV array model
Pmax         199.998 W     5999.94 W             200 W         6000 W
Voc          30.12 V       180.72 V              30.12 V       180.72 V
Isc          8.63 A        43.15 A               8.63 A        43.15 A
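
The datasheet totals in Table 2 follow from ideal series/parallel scaling of one module: series modules add voltage, parallel strings add current, and power scales with the module count. A minimal Python sketch of that arithmetic is given below; the function name `array_ratings` is hypothetical, and the ideal scaling ignores mismatch and wiring losses.

```python
def array_ratings(n_parallel, n_series,
                  p_module=199.998, voc_module=30.12, isc_module=8.63):
    """Ideal array ratings from one module's datasheet values (Table 1)."""
    return {
        "Pmax_W": n_parallel * n_series * p_module,  # every module contributes
        "Voc_V": n_series * voc_module,              # series modules add voltage
        "Isc_A": n_parallel * isc_module,            # parallel strings add current
    }


if __name__ == "__main__":
    small = array_ratings(n_parallel=5, n_series=6)
    for name, value in small.items():
        print(f"{name}: {value:.2f}")
```

The same function can be called with (10, 30) and (20, 30) for the medium-scale and big-scale layouts.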


3. MULTI-SCALE DUAL-STAGE FAULT DETECTION AND CLASSIFICATION ALGORITHM 
3.1. Medium-scale and big-scale PV array model: modeling and simulation 

Medium-scale and big-scale PV array models were constructed in MATLAB/Simulink by
adding panels in series and parallel strings, as shown in Figures 7 and 8. The input parameters (Voc,
Isc, Rs, and Rsh) for the medium-scale and big-scale PV array models were also taken from the PV manufacturer's
datasheet for the Solartech Energy ASC-6P-48-200, as listed in Table 1. The simulation results of the
medium-scale and big-scale PV array models under STC are presented in Table 3.

Figure 7. Model of medium-scale (10*30) PV array
Figure 8. Model of big-scale (20*30) PV array


Table 3. Comparison of simulation results (medium-scale and big-scale PV model) with actual
PV module datasheet

             Solartech Energy ASC-6P-48-200                             Simulated data of PV array model
Parameter    One module   (10*30) modules total   (20*30) modules total   Medium-scale (10*30)   Big-scale (20*30)
Pmax         199.99 W     59.99 kW                119.99 kW               60 kW                  120 kW
Voc          30.12 V      903.7 V                 903.7 V                 903.7 V                903.7 V
Isc          8.63 A       86.3 A                  172.6 A                 86.3 A                 172.6 A


It can be seen that the values of Pmax, Voc, and Isc generated are almost the same as those derived from the datasheet of
the Solartech Energy ASC-6P-48-200 listed in Table 1. Hence, it can be concluded that the proposed
medium-scale and large-scale PV array models are precise enough to predict their performance under normal
and fault conditions in this study. The remaining simulations for the medium-scale and large-scale PV array fault
models (LLF, GF, PS, OCF, FM, and DF) were carried out with the same procedure as for the small-scale PV
array model.


3.2. Fault detection and classification algorithm procedures 

Figures 9(a) and 9(b) show the flowcharts of the multi-scale dual-stage (MsDs) PV fault detection,
classification, and monitoring technique. The MsDs procedure consists of stage-1 and stage-2.
PV_nofault represents the PV array no-fault model, and PV_fault represents the PV array fault
models of DF, FM, GF, LLF, OCF, and PS. The flowchart of stage-1 describes the fault detection algorithm.
Because of the non-uniform PV output characteristics, a simple PV fault detection algorithm has been developed
that compares the power, voltage, and current generated by the PV array fault-free model, which are higher, with those of
the PV array fault models [25].

The parameters chosen for testing the detection algorithm are the difference in open-circuit
voltage (RVoc), the standard deviation of output power (stdP), and the mean output voltage and current (μV and μI).
The fault detection algorithm was tested using four different supervised learning algorithms (DA, RF,
KNN, and SVM) in MATLAB/Simulink to find the one with the best detection accuracy. The testing
procedure of this fault detection algorithm was then repeated and evaluated on the medium-scale and big-scale PV array
models built in this study to validate its practicality as a multi-scale PV array fault detection algorithm.
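
The four stage-1 parameters can be computed from sampled I-V data as in the following Python sketch. It is only an illustration, not the implementation used in this study. In particular, RVoc is assumed here to be the reference (fault-free) Voc minus the measured Voc, the standard deviation is taken as the sample standard deviation, and the function and argument names are hypothetical.

```python
from statistics import mean, stdev


def stage1_features(voltages, currents, voc_reference, voc_measured):
    """Return the stage-1 feature set from sampled voltage/current pairs."""
    if len(voltages) != len(currents):
        raise ValueError("voltages and currents must have the same length")
    powers = [v * i for v, i in zip(voltages, currents)]
    return {
        # ASSUMPTION: RVoc = fault-free Voc minus measured Voc
        "RVoc": voc_reference - voc_measured,
        "stdP": stdev(powers),     # sample standard deviation of output power
        "meanV": mean(voltages),   # mean output voltage
        "meanI": mean(currents),   # mean output current
    }


if __name__ == "__main__":
    # Toy samples for illustration only, not data from the study
    features = stage1_features(
        voltages=[0.0, 10.0, 20.0],
        currents=[8.0, 8.0, 4.0],
        voc_reference=30.0,
        voc_measured=28.0,
    )
    print(features)
```

A feature dictionary of this kind would be computed once per simulated data set and passed, together with its fault/no-fault label, to the classifier under test.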

Figure 9. The flowchart of (a) the MsDs fault detection algorithm (stage-1) and (b) the MsDs fault classification and location algorithm (stage-2)


The flowchart for stage-2 describes the testing procedure of the algorithm for the classification and
location of faults. The stage-2 procedure proceeds only if a fault is detected at stage-1. In this study, the
classification and location algorithm was tested and evaluated with the following processes:

i) The testing algorithm involved 15 data sets for the PV_nofault model and 15 data sets for each of the GF, LLF,
FM, OCF, DF, and PS PV_fault models.

ii) Fourteen feature vectors/input parameters, namely Pmax, Isc, Voc, Rs, RVoc, μV, μI, root mean square voltage
and current (rmsV, rmsI), variance of voltage and current (varV, varI), and standard deviations of power,
voltage, and current (stdP, stdV, stdI), were selected for the testing algorithm because they have been
shown to produce good accuracy for PV system/array fault detection and classification [21], [26], [27].

iii) The testing algorithm used four ML algorithms, DA, RF, KNN, and SVM, to obtain the best algorithm 
and produce the best classification accuracy. 

iv) The k-fold cross-validation method was adopted in the testing algorithm to optimize the parameters
chosen and improve the classification accuracy.

v) The testing algorithm procedure at stage-2 was repeated and evaluated on medium-scale and large-scale 
PV array models to establish its feasibility as a multi-scale PV array fault classification and monitoring 
algorithm. 
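
The evaluation loop described in the steps above can be sketched in Python as follows. This is a minimal illustration with several stated assumptions: the data are synthetic placeholders (seven well-separated classes with 15 samples and 14 features each), not the simulated PV data of this study; only a small k-nearest-neighbour classifier is implemented as a stand-in, and DA, RF, or SVM would slot into the same loop; the folds are stratified by class, which is a design choice made here and is not stated in the procedure above.

```python
import math
import random
from collections import Counter

LABELS = ["nofault", "DF", "FM", "GF", "LLF", "OCF", "PS"]


def synthetic_dataset(samples_per_class=15, n_features=14, seed=0):
    """Placeholder data: one tight, well-separated cluster per class."""
    rng = random.Random(seed)
    xs, ys = [], []
    for c, label in enumerate(LABELS):
        for _ in range(samples_per_class):
            xs.append([10.0 * c + rng.gauss(0.0, 0.5) for _ in range(n_features)])
            ys.append(label)
    return xs, ys


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with equal class shares."""
    rng = random.Random(seed)
    by_class = {}
    for j, label in enumerate(labels):
        by_class.setdefault(label, []).append(j)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for pos, j in enumerate(members):
            folds[pos % k].append(j)
    return folds


def knn_predict(train_x, train_y, x, n_neighbors=3):
    """Majority vote among the n_neighbors closest training samples."""
    order = sorted(range(len(train_x)), key=lambda j: math.dist(train_x[j], x))
    votes = Counter(train_y[j] for j in order[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(xs, ys, k=5, n_neighbors=3):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    correct = 0
    for test_idx in stratified_folds(ys, k):
        held_out = set(test_idx)
        train_idx = [j for j in range(len(xs)) if j not in held_out]
        train_x = [xs[j] for j in train_idx]
        train_y = [ys[j] for j in train_idx]
        correct += sum(
            knn_predict(train_x, train_y, xs[j], n_neighbors) == ys[j]
            for j in test_idx
        )
    return correct / len(xs)


if __name__ == "__main__":
    features, labels = synthetic_dataset()
    print("5-fold accuracy:", cross_val_accuracy(features, labels, k=5))
```

Because every sample is held out exactly once, the reported accuracy reflects performance on data the classifier did not see during fitting, which is the purpose of the cross-validation step.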


4. RESULT AND DISCUSSION 
4.1. Simulation results and analysis of fault detection algorithm 
Figure 10 shows the I-V curves generated from the MATLAB/Simulink simulation of
the small-scale PV array models, illustrating the relationship between the output voltage and the output
current. From the figure, it can be observed that the I-V curves generated from the simulation of the
PV_nofault model and the six PV_fault models have different characteristics, as follows:
i) For the PS simulation, the Isc and the Voc remain unchanged, while the Pmax decreases.
ii) For the OCF simulation, the Isc and Pmax decrease, while the Voc remains unchanged.
iii) For the FM simulation, the Isc remains unchanged, while the Voc and the gradient of the end part of the
I-V curve decrease.
iv) For the DF simulation, the Voc remains unchanged, but the Isc decreases slightly
and the overall slope of the I-V curve decreases.
v) For the GF simulation, the Voc increases, while the other characteristics of the I-V curve remain unchanged.
vi) For the LLF simulation, the Voc decreases while the Isc remains unchanged, and there is no significant change
in the remaining characteristics of the I-V curve.

Figure 10. I-V curves of the fault and no-fault models of the small-scale PV arrays


Table 4 shows the fault detection accuracies of the proposed small-scale,
medium-scale, and big-scale PV array models. It can be seen that the fault detection method based on
the RF algorithm achieved 100% accuracy for all PV array models. The other algorithms also
achieved good accuracies of at least 96%.


Table 4. Fault detection accuracies (small, medium, and big-scale PV models) using four ML algorithms

                          Fault detection accuracy (%)
Algorithm type            Small-scale PV model   Medium-scale PV model   Big-scale PV model
Discriminant analysis     99                     99                      98
Random forest             100                    100                     100
K-nearest neighbours      96                     99                      98
Support vector machine    97                     99                      100


4.2. Simulation results and analysis of fault classification algorithm 

Table 5 presents the classification accuracy for the small-scale, medium-scale, and large-scale PV array
models developed in this study. The RF algorithm gives the highest classification accuracy for the
big-scale PV model (93.3%), matches DA at 90% for the medium-scale PV model, and reaches 78% for the
small-scale PV model, where DA is slightly higher at 82%. Meanwhile, the accuracy of the RF algorithm
for fault classification and location (module/string/array) for the six faults, DF, FM, GF, LLF, OCF, and PS,
of the small-scale, medium-scale, and big-scale PV array models can be seen in Table 6.


Table 5. Fault classification accuracies (small, medium, and big-scale PV models) using four ML algorithms

                          Fault classification accuracy (%)
Algorithm type            Small-scale PV model   Medium-scale PV model   Big-scale PV model
Discriminant analysis     82                     90                      86
Random forest             78                     90                      93.3
K-nearest neighbours      48                     53                      55
Support vector machine    70                     89                      71


Table 6. Fault classification accuracies of six faults for small, medium, and big-scale PV model using RF algorithm

                 Fault classification and location accuracy (%)
Fault            Small-scale PV model   Medium-scale PV model   Big-scale PV model
DF (array)       93.3                   93.3                    86.7
FM (module)      93.3                   93.3                    80.0
GF (string)      50.4                   100                     100
LLF (string)     53.3                   100                     93.3
OCF (string)     86.7                   80.0                    93.3
PS (array)       93.3                   80.0                    100


It can be seen that the fault classification method based on the RF algorithm achieves high accuracy in most cases.
Of the 18 fault and scale combinations in Table 6, 11 achieve more than 90% classification
accuracy, and five (DF and FM at big scale, OCF and PS at medium scale, and OCF at small scale) achieve between
80% and 90%. Only the fault classification for GF and LLF at small scale achieved low accuracy (50.4% and 53.3%).
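
The grouping of the Table 6 entries into accuracy bands can be checked with a short tally. The Python sketch below copies the values from Table 6; the band limits (above 90%, 80% to 90%, below 80%) and the function name `tally` are choices made here for illustration.

```python
# Accuracy (%) per fault for the (small, medium, big) scale models, from Table 6
TABLE_6 = {
    "DF (array)":   (93.3, 93.3, 86.7),
    "FM (module)":  (93.3, 93.3, 80.0),
    "GF (string)":  (50.4, 100.0, 100.0),
    "LLF (string)": (53.3, 100.0, 93.3),
    "OCF (string)": (86.7, 80.0, 93.3),
    "PS (array)":   (93.3, 80.0, 100.0),
}


def tally(table, high=90.0, moderate=80.0):
    """Count entries above `high`, between the limits, and below `moderate`."""
    values = [acc for row in table.values() for acc in row]
    return {
        "above_high": sum(acc > high for acc in values),
        "between": sum(moderate <= acc <= high for acc in values),
        "below_moderate": sum(acc < moderate for acc in values),
    }


if __name__ == "__main__":
    print(tally(TABLE_6))
```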


4.3. Discussion 

This study has developed and simulated MsDs algorithms for PV array fault detection,
classification, and location via MATLAB/Simulink, consisting of stage-1 (fault detection algorithm) and
stage-2 (fault classification and location algorithm). Although the I-V curves generated from the simulation
of the PV no-fault (PV_nofault) model and the six PV fault (PV_fault) models share some
characteristics of Voc, Isc, and Pmax (Figure 10), high accuracies were achieved when the
developed fault detection algorithm was tested using four different supervised learning algorithms: DA, RF,
KNN, and SVM. The RF algorithm achieved 100% accuracy for all scales of PV array models, as can be
seen in Table 4.

For fault classification at stage-2, the RF algorithm again achieved high accuracy for the medium-scale
and large-scale PV array models (90% or more), as shown in Table 5.
Only the small-scale PV array produced modest accuracy. However, looking at the accuracies of fault
classification and location for each PV_fault model, as presented in Table 6, the RF algorithm combined
with k-fold cross-validation delivered high accuracy (more than 90%) for most PV array fault models;
the exceptions with low accuracy are GF and LLF on the small-scale PV array. This
might be due to the low discrimination power of the ML algorithms in describing these faults, which results in
poor performance. In summary, the proposed MsDs has the following research contributions over earlier
works [8], [28]:

i) It is a low-cost model. The fault detection algorithm with k-fold cross-validation at
stage-1 has been shown to detect multiple common faults (GF, LLF, PS, OCF, DF, and FM) in PV arrays with
good accuracy and without interruption to the system.

ii) The classification and location algorithm with k-fold cross-validation at stage-2 can identify faults at
different locations (string, module, or array level), which is useful for large-scale PV systems/plants, and
it achieved good accuracies.

iii) The study has shown that the developed algorithms are easy to execute and feasible to apply to all PV
array scales globally regardless of environmental conditions.

iv) The simple fault detection algorithm at stage-1 is beneficial for preventive and predictive maintenance in
finding hidden faults in PV systems that CPD cannot detect. Such hidden faults can reduce the system's
efficiency and lead to worse outcomes such as fire hazards, injuries, and electric shocks to the PV
system operator.


4.4. Limitations

This study has some limitations. First, the MsDs technique has been tested using simulated data only, so
accuracy may vary when the proposed algorithm is implemented on an actual operating PV system. Second,
the MsDs algorithm was tested using supervised learning algorithms, in which fully labeled data
were used; its accuracy on the unlabeled data encountered in maintenance work
has not been verified. Lastly, the algorithm for fault classification and location tested on the small-scale PV
array model showed only moderate accuracies for GF (string) and LLF (string).


5. CONCLUSION AND FUTURE WORK DIRECTION 

This study proposed a multi-scale dual-stage (MsDs) model for a PV array fault detection,
classification, and monitoring technique that has demonstrated good accuracy. The MsDs consists of the PV
array fault detection algorithm at stage-1 and the PV array fault classification and location algorithm at
stage-2. The MsDs algorithms have been tested using four supervised learning algorithms, namely discriminant
analysis (DA), k-nearest neighbor (KNN), support vector machine (SVM), and random forest (RF),
together with k-fold cross-validation to find the algorithm that delivers the best accuracy. Further,
the MsDs has also been evaluated on small, medium, and large-scale PV array models to ascertain its
feasibility on multi-scale PV arrays.

The simulation results show that the RF algorithm achieved the best accuracy for
both the medium-scale and big-scale PV array models, with 100% accuracy in detecting the various faults (open-circuit fault,
degradation array, partial shading, faulty module, ground fault, and line-line fault) and 90% or more
accuracy for fault classification and location. The exceptions are the ground fault and line-line fault models of
the small-scale PV array, which produced low classification accuracy. Overall, the simulation results
support the study's objective of developing a low-cost model for PV arrays with fault detection and
classification algorithms that can be implemented at various PV array scales and applied to PV
maintenance work for better efficiency, reliability, and security of the PV system.

Nevertheless, some recommendations can be made for future work. First, the proposed MsDs
technique should be validated by testing the developed algorithms with data from a real PV system.
This would show how well the accuracy obtained from the developed PV array model carries over to an actual PV
system. Furthermore, the MsDs testing algorithm in this study applied supervised learning algorithms, in
which fully labeled data were used. Thus, the testing algorithms need to be evaluated on unlabeled data to
obtain a more reliable accuracy estimate and verify the feasibility of MsDs for PV system maintenance work. Finally,
more training and testing need to be done on the classification and location algorithm for the line-line and
ground faults of the small-scale PV array models to improve accuracy.